feat(core): bump DataFusion to 49.0.1 by goldmedal · Pull Request #1293 · Canner/wren-engine

goldmedal · 2025-08-21T10:29:44Z

Description

Upgrade the base DataFusion version to 49.0.1
This PR points to support accessing the element of struct array using dot datafusion#5

Summary by CodeRabbit

Bug Fixes
- Preserve span/metadata during query rewrites; adapt literal/limit and datetime handling for stable parsing and planning.
Refactor
- Simplified SQL translation paths, adjusted SQL formatting (join style, explicit casts, aliasing) and removed several analyzer/optimizer rule steps.
Chores
- Bumped multiple dependencies and fixed repository URL; updated runtime dependency versions.
Tests
- Updated numerous test expectations and snapshots, adjusted function-count expectations, and temporarily disabled a flaky view filter test.

coderabbitai · 2025-08-21T10:29:51Z

Walkthrough

Adapts code to upstream DataFusion API/AST changes (Literal/ValueWithSpan/ObjectNamePart/FieldRef/ScalarFunctionArgs), preserves spans/metadata through logical-plan rewrites, adjusts MDL analyzer/optimizer rule pipelines and dialect translation, bumps workspace dependencies, and updates tests and sqllogictest expectations.

Changes

Cohort / File(s)	Summary
AST shape & span preservation `wren-core-py/src/context.rs`, `wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs`, `wren-core/core/src/logical_plan/utils.rs`, `wren-core/core/src/logical_plan/analyze/model_anlayze.rs`, `wren-core/core/src/mdl/dialect/inner_dialect.rs`	Updated pattern matches to new AST shapes (e.g., `Expr::Literal(_, _)`, `ValueWithSpan`); preserve `spans`/`metadata` when rewriting `Subquery`/`Alias`; adapt join field rename (`null_equals_null` → `null_equality`).
MDL context: imports & rule pipelines `wren-core/core/src/mdl/context.rs`	Changed `CatalogProvider` import path; `ViewTable::try_new` → `ViewTable::new`; removed several analyzer/optimizer rules from pipelines (ExpandWildcardRule, InlineTableScan, UnwrapCastInComparison).
UDF & datatypes API updates `wren-core/core/src/mdl/function.rs`	Introduced/used `FieldRef`; changed scalar UDF API to `invoke_with_args(ScalarFunctionArgs)` and window UDF `field` to return `FieldRef`; adjusted return-field construction and tests (list element renamed `item` and made nullable).
Dialect refactor & SQL generation simplification `wren-core/core/src/mdl/dialect/wren_dialect.rs`, `wren-core/core/src/mdl/dialect/utils.rs`	Removed custom SQL helpers for arrays/named_structs and now delegate to inner dialect; construct `ObjectName` using `ObjectNamePart::Identifier(Ident)`; adjusted imports and identifier-quoting logic.
Workspace / manifests `wren-core/Cargo.toml`, `wren-core-py/Cargo.toml`	Fixed malformed `repository` URL; bumped `async-trait`, `datafusion` branch, and `tokio` versions; bumped tokio in `wren-core-py`.
SQL generation & tests `wren-core/core/src/mdl/mod.rs`, `wren-core-py/tests/test_modeling_core.py`, `ibis-server/tests/conftest.py`, `ibis-server/tests/routers/v3/connector/local_file/test_functions.py`	Updated snapshot/test expectations: explicit `RIGHT OUTER JOIN`, many `CAST(... AS BIGINT)` additions, alias/derived-projection changes; adjusted expected function counts (tests updated from 276→283 and 283→290 where applicable); test constants updated.
sqllogictest adjustments `wren-core/sqllogictest/test_files/view.slt`	Commented-out a failing TR block, added a TODO about `push_down_filter`, preserved query/result as comments.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Frontend as SQL Frontend
  participant MDL as MDL Context
  participant Analyzer as Analyzer Pipeline
  participant Optimizer as Optimizer Pipeline
  participant DF as DataFusion AST

  User->>Frontend: submit SQL / request transformation
  Frontend->>MDL: build LogicalPlan (updated AST shapes)
  MDL->>Analyzer: analyze_rule_for_*()
  note right of Analyzer: analyzer pipeline changed (removed rules)
  Analyzer-->>MDL: analyzed LogicalPlan (spans/metadata preserved)
  MDL->>Optimizer: optimize_rule_for_unparsing()
  note right of Optimizer: optimizer pipeline changed (removed rules)
  Optimizer-->>MDL: optimized LogicalPlan
  MDL->>DF: translate to SQL/AST (uses ObjectNamePart, FieldRef, ValueWithSpan)
  DF-->>MDL: SQL/AST nodes
  MDL-->>Frontend: final SQL/plan

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fix(core): simplify the available function lists #1131 — overlaps on function-list plumbing and test constant changes around available functions.
fix(core): expand the wildcard before Wren rewrite rules #1145 — modifies analyze-rule pipelines in mdl/context.rs, likely to conflict with removed/added analyzer rules.
deps(core): upgrade to datafusion 0.46.1 #1124 — dependency/manifest updates for DataFusion/async-trait that align with this PR’s upgrades.

Suggested reviewers

douenergy
wwwy3y3
onlyjackfrost

Poem

In burrows of bytes I nibble and hop,
Spans snug in pockets, no data will drop.
Rules trimmed like twigs, functions re-lined,
I hop through ASTs and tidy each bind.
Carrot of tests stashed — a rabbit-sized grind. 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (13)

wren-core/core/src/mdl/dialect/inner_dialect.rs (3)
185-188: Good adaptation to Expr::Literal(_, _) + minor simplification

Matching the two-field Literal for Utf8/LargeUtf8 is correct for DF 49.x. You can simplify the return to avoid wrapping an Ok around a ?.
-            | Expr::Literal(ScalarValue::LargeUtf8(Some(s)), _) => {
-                Ok(self.datetime_field_from_str(s)?)
-            }
+            | Expr::Literal(ScalarValue::LargeUtf8(Some(s)), _) => {
+                self.datetime_field_from_str(s)
+            }
195-231: Consider stricter validation for WEEK(...) parsing

The WEEK(…) parser accepts any trailing characters after the closing ')' and doesn’t validate unmatched parentheses robustly. Not a blocker, but it can reduce surprising inputs.

Ensure end == s.len() - 1 after finding ')'.

Validate that start < end and there’s no extra content outside WEEK(…).

251-254: Typo in local variable name

uppsercase should be uppercase. Purely cosmetic.
-fn non_uppercase(sql: &str) -> bool {
-    let uppsercase = sql.to_uppercase();
-    uppsercase != sql
-}
+fn non_uppercase(sql: &str) -> bool {
+    let uppercase = sql.to_uppercase();
+    uppercase != sql
+}
wren-core/core/src/logical_plan/utils.rs (2)
50-52: List child field aligns with DF 49; add a short note

Switching to Field::new("item", data_type, true) matches Arrow/DF conventions (child name “item”, elements nullable). Good change. Consider a short doc comment here to avoid future regressions.

169-171: Potentially invalid Arrow time type mapping

Mapping "time" to DataType::Time32(TimeUnit::Nanosecond) is unconventional. In Arrow, nanoseconds typically use Time64(Nanosecond); Time32 is generally for Second/Millisecond. Not related to this PR, but worth correcting to avoid downstream kernel incompatibilities and surprising behavior in parquet/IPC.
- "time" => DataType::Time32(TimeUnit::Nanosecond), // chose the smallest time unit
+ "time" => DataType::Time64(TimeUnit::Nanosecond),
If you prefer 32-bit, use Second or Millisecond.
wren-core/core/src/mdl/dialect/utils.rs (1)

58-63: Correct ObjectName construction under new AST

Wrapping Ident in ObjectNamePart::Identifier is the right adjustment for DF 49.0.1. One minor improvement: if you have spans available upstream, consider threading them instead of Span::empty() to preserve diagnostics.

wren-core/sqllogictest/test_files/view.slt (1)

19-23: Don’t silently comment out failing logic tests—gate or track

Understandable to disable while fixing push_down_filter on views. Please either:

Add a tracking issue and reference it here, or

Use a harness-supported skip/ignore mechanism (if available) instead of comments, so the test remains visible in reports.

I can help convert this to a gated/ignored test if the framework supports it.

wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (1)

146-154: Literal arm updated; consider adding tests for both Cast and TryCast

Updating to Expr::Literal(value, _) is correct. Consider adding unit tests that cover:

Cast/TryCast of timestamp literals across all four units

NamePreserver preserving aliases through simplification

I can scaffold tests under core/src/logical_plan/optimize/tests if helpful.

wren-core-py/src/context.rs (1)

251-252: Consider preserving numeric formatting semantics for newly introduced LIMIT.

You default the is flag to false for new limits. If your AST uses this flag to track formatting (e.g., integer vs decimal/long), consider setting it to a sensible default (likely false is fine) or making it configurable if downstream expects a specific representation.
wren-core/core/src/logical_plan/analyze/model_anlayze.rs (1)
715-726: Preserve Column spans when relation doesn’t belong to MDL.

In the non-MDL path, rewrite_column_qualifier currently rebuilds a Column with Spans::new(), which drops original spans while marking Transformed::no. Capture and pass through the original spans to avoid span loss.

Apply this minimal change to preserve spans without broad refactors:
-            Expr::Column(Column { relation, name, .. }) => {
-                if let Some(relation) = relation {
+            Expr::Column(Column { relation, name, spans, .. }) => {
+                if let Some(relation) = relation {
                     Ok(self.rewrite_column_qualifier(relation, name, alias_model))
                 } else {
@@
-                    Ok(Transformed::yes(ident))
+                    Ok(Transformed::yes(ident))
                 }
             }
and adjust rewrite_column_qualifier’s non-MDL branch to use the provided spans (pass it in or return Transformed::no with the original Column if available):
-        } else {
-            Transformed::no(Expr::Column(Column {
-                relation: Some(relation),
-                name,
-                spans: Spans::new(),
-            }))
-        }
+        } else {
+            // Prefer to return the original Expr::Column with its spans (if available)
+            Transformed::no(Expr::Column(Column {
+                relation: Some(relation),
+                name,
+                spans, // preserve existing spans
+            }))
+        }
If threading spans into rewrite_column_qualifier complicates the signature, alternatively decide the Transformed::no branch directly in map_column_and_rewrite_qualifier and return the original Expr::Column there.
wren-core/core/src/mdl/function.rs (2)
357-366: Window UDF field now returns FieldRef; derive return type from input_fields.

This matches the new API and avoids reliance on input_types. One suggestion: consider nullability propagation (e.g., SameAsInput should often inherit input nullability).

Example tweak:
-        Ok(Field::new(field_args.name(), return_type, false).into())
+        let nullable = match self.return_type {
+            ReturnType::SameAsInput => field_args.input_fields().first().map(|f| f.is_nullable()).unwrap_or(true),
+            _ => false,
+        };
+        Ok(Field::new(field_args.name(), return_type, nullable).into())
140-147: Support LargeList and FixedSizeList for SameAsInputFirstArrayElement.

Currently only List is handled; add LargeList and FixedSizeList to avoid not_impl_err when callers pass those array types.
-            ReturnType::SameAsInputFirstArrayElement => {
+            ReturnType::SameAsInputFirstArrayElement => {
                 if arg_types.is_empty() {
                     return not_impl_err!("No input type");
                 }
-                if let DataType::List(field) = &arg_types[0] {
-                    field.data_type().clone()
-                } else {
-                    return not_impl_err!("Input type is not array");
-                }
+                match &arg_types[0] {
+                    DataType::List(field) |
+                    DataType::LargeList(field) => field.data_type().clone(),
+                    DataType::FixedSizeList(field, _) => field.data_type().clone(),
+                    _ => return not_impl_err!("Input type is not array"),
+                }
             }
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
42-47: identifier_quote_style compiles Regex on every call; cache it.

Minor perf nit: compile the regex once statically.

For example:
-use regex::Regex;
+use regex::Regex;
+use once_cell::sync::Lazy;
+static IDENT_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"^[a-zA-Z_][a-zA-Z0-9_]*$").unwrap());
@@
-        let identifier_regex = Regex::new(r"^[a-zA-Z_][a-zA-Z0-9_]*$").unwrap();
-        if ALL_KEYWORDS.contains(&identifier.to_uppercase().as_str())
-            || !identifier_regex.is_match(identifier)
+        if ALL_KEYWORDS.contains(&identifier.to_uppercase().as_str())
+            || !IDENT_RE.is_match(identifier)
             || non_lowercase(identifier)
Ensure once_cell is available in the crate; otherwise use lazy_static.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 388a91a and 8a8389c.

⛔ Files ignored due to path filters (1)

wren-core-py/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

wren-core-py/src/context.rs (2 hunks)
wren-core/Cargo.toml (1 hunks)
wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5 hunks)
wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (2 hunks)
wren-core/core/src/logical_plan/utils.rs (2 hunks)
wren-core/core/src/mdl/context.rs (2 hunks)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1 hunks)
wren-core/core/src/mdl/dialect/utils.rs (2 hunks)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2 hunks)
wren-core/core/src/mdl/function.rs (4 hunks)
wren-core/core/src/mdl/mod.rs (21 hunks)
wren-core/sqllogictest/test_files/view.slt (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

wren-core/core/src/mdl/context.rs (2)

wren-core/core/src/mdl/mod.rs (1)

catalog (294-296)

wren-core-base/manifest-macro/src/lib.rs (1)

view (347-366)

🪛 Gitleaks (8.27.2)

wren-core/core/src/mdl/mod.rs

2345-2345: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

2351-2351: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: test
GitHub Check: ci
GitHub Check: cargo test (macos-aarch64)
GitHub Check: cargo test (win64)
GitHub Check: cargo check
GitHub Check: cargo test (macos)

🔇 Additional comments (22)

wren-core/core/src/logical_plan/utils.rs (1)

298-303: All Expr::Literal patterns confirmed updated
A repository-wide scan of every Expr::Literal usage shows all are now using two arguments ((_, _)), and there are no remaining single-argument patterns to update:

core/src/logical_plan/utils.rs: Expr::Literal(_, _)

core/src/logical_plan/optimize/simplify_timestamp.rs:

matches!(sub_expr.as_ref(), Expr::Literal(_, _))

Expr::Literal(value, _)

core/src/mdl/dialect/inner_dialect.rs: Expr::Literal(ScalarValue::Utf8(Some(s)), _) / Expr::Literal(ScalarValue::LargeUtf8(Some(s)), _)

No further changes are required.

wren-core/core/src/mdl/dialect/utils.rs (1)

21-21: Import of ObjectNamePart is required and correct

The import aligns with the new AST form for ObjectName segments. Looks good.

wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (1)

135-141: Literal shape check updated correctly

The guard matches!(sub_expr.as_ref(), Expr::Literal(_, _)) is the right fix for DF 49. Behavior unchanged. Nice.

wren-core-py/src/context.rs (1)

33-33: Good: switched to ValueWithSpan to keep literal spans intact.

Using ValueWithSpan in the import set aligns with the PR-wide span/metadata preservation. No concerns.

wren-core/Cargo.toml (2)

11-11: Repository URL fix looks good.

16-16: Bump async-trait to 0.1.89: compatible and non-breaking here.

wren-core/core/src/logical_plan/analyze/model_anlayze.rs (4)

104-108: Robust Subquery matching with .. to accommodate new fields.

Good adaptation for additional Subquery fields (e.g., spans). This prevents brittle matches.

446-447: Updated Join field null_equality: aligned with DataFusion v49.

Field rename addressed correctly; functional behavior preserved.

596-611: Preserve Subquery spans on rewrite.

Carrying spans through the Subquery transformation is consistent with the PR goal of span/metadata preservation.

731-740: Alias now preserves metadata: good.

Retaining metadata in Alias during expression rewrites avoids losing documentation or hints attached upstream.

wren-core/core/src/mdl/function.rs (3)

1-1: Switch to FieldRef: correct for DataFusion v49 Arrow types.

Importing FieldRef and using it where needed aligns with upstream type changes.

228-231: Updated ScalarUDFImpl to invoke_with_args: correct trait adaptation.

Returning an internal_err ensures the bypass UDF is never executed at runtime, which is the intended design.

604-605: Test update for list element Field(name="item", nullable=true) is correct.

Matches DF v49 semantics; good coverage.

wren-core/core/src/mdl/dialect/wren_dialect.rs (1)

21-21: Narrowed imports and reliance on ALL_KEYWORDS/Expr: cleaner surface.

wren-core/core/src/mdl/context.rs (4)

17-17: Import path correctly updated for DataFusion 49.0.1

The CatalogProvider import path change from datafusion::catalog_common to datafusion::catalog aligns with the upstream DataFusion 49.0.1 reorganization.

303-303: ViewTable constructor change handled correctly

The switch from ViewTable::try_new to ViewTable::new is appropriate since the new API no longer returns a Result. This aligns with DataFusion 49.0.1 API changes where view creation is now infallible.

169-187: Simplified analyzer rules for local runtime

The removal of ExpandWildcardRule and InlineTableScan analyzer rules appears intentional based on the PR context. These rules are likely now handled differently in DataFusion 49.0.1 or are no longer necessary for the local runtime mode.

220-260: Ignore incorrect assertion about UnwrapCastInComparison removal

A search of the current codebase shows no references to UnwrapCastInComparison, so it was never part of the unparser pipeline and hasn’t been removed. The suggestion to verify its replacement isn’t applicable here.

Likely an incorrect or invalid review comment.

wren-core/core/src/mdl/mod.rs (4)

695-695: Test assertions updated for new SQL generation behavior

The test expectations now include explicit CAST(column AS BIGINT) for numeric comparisons and RIGHT OUTER JOIN instead of RIGHT JOIN. These changes reflect DataFusion's more explicit type casting in the generated SQL for better cross-database compatibility.

Also applies to: 707-707, 1173-1173

1739-1739: Numeric predicates now include explicit type casting

The generated SQL now includes CAST(column AS BIGINT) for integer comparisons (e.g., CAST(customer.c_custkey AS BIGINT) = 1). This ensures type safety and consistency across different database backends.

Also applies to: 1774-1774, 1823-1826, 1874-1877

2345-2345: False positive: Test data strings are not API keys

The static analysis tool incorrectly flagged the test strings 'Gura' and session_user as potential API keys. These are clearly test data values used in row-level access control testing scenarios.

Also applies to: 2351-2351

1173-1173: UNNEST test output includes new aliasing structure

The UNNEST query now generates a derived_projection alias with explicit column naming. This change improves SQL clarity and ensures proper column referencing in complex queries.

wren-core-py/src/context.rs

wren-core/Cargo.toml

wren-core/core/src/mdl/dialect/wren_dialect.rs

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

ibis-server/tests/conftest.py (1)

14-14: Hard-coded function count updated to 283 — consider deriving dynamically to reduce brittleness

This constant tends to change with DataFusion bumps. If feasible, derive the baseline count at runtime (e.g., via an API call or a helper in tests) or centralize it in a single test utility to avoid manual updates in multiple places.

If you keep the constant, ensure all tests referencing DataFusion function counts have been updated consistently (search for “len(functions) ==” and “DATAFUSION_FUNCTION_COUNT” across the repo).
wren-core-py/tests/test_modeling_core.py (1)
139-139: Replace brittle exact counts with a dynamic calculation based on union of built-ins and CSV-defined functions

Exact numbers will drift with DataFusion releases. Compute the expected length as the union of:

base registry functions (no CSV)

names in tests/functions.csv

This keeps the assertion stable while still verifying the merge logic.

Apply this diff within the current test to remove hard-coded counts:
@@
-    functions = session_context.get_available_functions()
-    assert len(functions) == 290
+    functions = session_context.get_available_functions()
+    # Dynamically derive expected count: DataFusion registry ∪ CSV custom functions
+    base_functions = SessionContext(manifest_str, None).get_available_functions()
+    base_names = {f.name for f in base_functions}
+    import csv  # local import to keep test module scope clean
+    with open(path, newline="") as f:
+        csv_names = {row["name"] for row in csv.DictReader(f) if row.get("name")}
+    assert len(functions) == len(base_names | csv_names)
@@
-    functions = session_context.get_available_functions()
-    assert len(functions) == 283
+    functions = session_context.get_available_functions()
+    assert len(functions) == len(base_names)
If you prefer a dedicated helper, I can factor this into a small utility used by both this test and ibis-server so future bumps don’t require manual edits. Do you want me to open a follow-up PR for that?

Also applies to: 151-151

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8a8389c and 32c0bdf.

⛔ Files ignored due to path filters (1)

wren-core-py/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

ibis-server/tests/conftest.py (1 hunks)
wren-core-py/Cargo.toml (1 hunks)
wren-core-py/src/context.rs (2 hunks)
wren-core-py/tests/test_modeling_core.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

wren-core-py/src/context.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: cargo test (amd64)
GitHub Check: ci
GitHub Check: cargo test (win64)

🔇 Additional comments (1)

wren-core-py/Cargo.toml (1)

22-22: Verify Tokio version & feature consistency in wren-core workspace

Please confirm the following based on the audit of all tokio declarations:

In wren-core/Cargo.toml (the workspace root):

tokio = { version = "1.46", features = ["rt", "rt-multi-thread", "macros"] }

In wren-core/core/Cargo.toml:

tokio = { workspace = true, features = ["rt", "rt-multi-thread", "macros"] }

In wren-core/benchmarks/Cargo.toml and wren-core/wren-example/Cargo.toml:

tokio = { workspace = true } (inherits features from the workspace root)

In wren-core/sqllogictest/Cargo.toml:

Under regular [dependencies]: tokio = { workspace = true } (inherits all features)

Under [[test]] deps: tokio = { workspace = true, features = ["rt-multi-thread"] }
– Note: this overrides the workspace’s full feature set and omits "macros". If your tests use #[tokio::test] or any Tokio macros, you’ll need to add "macros" here.

Separately, wren-core-py/Cargo.toml is not part of that workspace—it pins tokio = "1.46.0" with default features. If you intend for the Python-binding crate to share exactly the same version and feature flags, you can either add it to the workspace and switch to workspace = true or continue managing its dependency explicitly.

Optional refactoring recommendation:

Centralize the Tokio dependency under a [workspace.dependencies] section in the workspace-root wren-core/Cargo.toml (e.g. tokio = { version = "1.46.0", features = ["rt", "rt-multi-thread", "macros"] }). This will ensure all member crates (including tests) stay in sync without per-crate repetition.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

ibis-server/tests/routers/v3/connector/local_file/test_functions.py (2)

56-58: Fix stale inline comment to match the new expectation (429 → 427).

The inline comment still says 429 while the assertion expects 427. This will confuse future readers.

Apply this diff:

-    # 429 is the number of functions in `resources/function_list/duckdb.csv` file excluded the default functions in DataFusion.
-    assert len(result) == DATAFUSION_FUNCTION_COUNT + 427
+    # 427 is the number of functions in resources/function_list/duckdb.csv, excluding the default DataFusion functions.
+    assert len(result) == DATAFUSION_FUNCTION_COUNT + 427

52-58: Make the size assertion data-driven to avoid churn on every DataFusion bump.

Right now the test hard-codes “427”. You can compute the delta by comparing the CSV function names against the base function names returned by the API (with remote list unset). This keeps the test stable when DataFusion adds/removes built-ins or when the CSV changes.

Proposed refactor within this block:

-    config.set_remote_function_list_path(function_list_path)
-    response = await client.get(url=f"{base_url}/functions")
-    assert response.status_code == 200
-    result = response.json()
-    # 429 is the number of functions in `resources/function_list/duckdb.csv` file excluded the default functions in DataFusion.
-    assert len(result) == DATAFUSION_FUNCTION_COUNT + 427
+    # Capture the base set of function names from DataFusion alone
+    base_names = {f["name"] for f in result}
+    base_len = len(base_names)
+
+    # Enable remote function list and fetch again
+    config.set_remote_function_list_path(function_list_path)
+    response = await client.get(url=f"{base_url}/functions")
+    assert response.status_code == 200
+    result = response.json()
+
+    # Compute expected delta from the CSV by excluding names already present in DataFusion
+    csv_names = _read_duckdb_function_names(function_list_path)
+    expected_delta = len(csv_names - base_names)
+    assert len(result) == base_len + expected_delta

Add this small helper (outside this function, anywhere in the module), and update imports accordingly:

import csv
import os

def _read_duckdb_function_names(function_list_dir: str) -> set[str]:
    csv_path = os.path.join(function_list_dir, "duckdb.csv")
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["name"] for row in csv.DictReader(f)}

This mirrors the server’s de-duplication by name and removes the magic number.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 32c0bdf and 2babd27.

📒 Files selected for processing (1)

ibis-server/tests/routers/v3/connector/local_file/test_functions.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: cargo test (macos)
GitHub Check: cargo test (macos-aarch64)
GitHub Check: cargo test (win64)
GitHub Check: cargo check
GitHub Check: test
GitHub Check: ci

douenergy · 2025-08-22T06:06:08Z

Thanks @goldmedal

goldmedal added 2 commits August 20, 2025 09:53

tmp

498a682

upgrade datafusion to 49.0.1

8a8389c

github-actions bot added core dependencies Pull requests that update a dependency file rust Pull requests that update Rust code python Pull requests that update Python code labels Aug 21, 2025

coderabbitai bot reviewed Aug 21, 2025

View reviewed changes

wren-core-py/src/context.rs Show resolved Hide resolved

wren-core/Cargo.toml Show resolved Hide resolved

wren-core/Cargo.toml Show resolved Hide resolved

wren-core/core/src/mdl/dialect/wren_dialect.rs Show resolved Hide resolved

goldmedal added 3 commits August 21, 2025 19:10

update datafusion version

92d764b

address comment

9734233

update number of datafusion

32c0bdf

github-actions bot added the ibis label Aug 22, 2025

coderabbitai bot reviewed Aug 22, 2025

View reviewed changes

fix function test

2babd27

coderabbitai bot reviewed Aug 22, 2025

View reviewed changes

goldmedal requested a review from douenergy August 22, 2025 03:39

douenergy approved these changes Aug 22, 2025

View reviewed changes

douenergy merged commit 92a1c7d into Canner:main Aug 22, 2025
15 checks passed

coderabbitai bot mentioned this pull request Sep 2, 2025

fix(core): introduce DATE_DIFF function #1303

Merged

coderabbitai bot mentioned this pull request Oct 2, 2025

chore(ibis): enhance the function list of BigQuery and MSSQL #1336

Merged

coderabbitai bot mentioned this pull request Oct 29, 2025

feat(core): support UNNEST syntax for Snowflake #1357

Merged

nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026

feat(core): bump DataFusion to 49.0.1 (Canner#1293)

836d09d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): bump DataFusion to 49.0.1#1293

feat(core): bump DataFusion to 49.0.1#1293
douenergy merged 6 commits intoCanner:mainfrom
goldmedal:feat/upgrade-datafusion-49.0.1

goldmedal commented Aug 21, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Aug 21, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

douenergy commented Aug 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

goldmedal commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

douenergy commented Aug 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

goldmedal commented Aug 21, 2025 •

edited

Loading

coderabbitai bot commented Aug 21, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)